Introduction to Statistics

Bennett Kleinberg

Week 10

Statistical power

  • Part 1: What is statistical power?
  • Part 2: How do we calculate statistical power?

Part 1: What is statistical power?

Back to week 4

Two kinds of errors: Type 1 errors and Type 2 errors

Type 1 errors

Analogy: false positives

We conclude there is a difference (an effect), but it’s a false alarm (in reality there is no effect).

In hypothesis terms: we reject the null but shouldn’t have done so.

Type 1 errors

We want to keep that error low.

i.e. we want to be quite sure that there is an effect.

This is all contained in the alpha level: under the null, a proportion of exactly \(\alpha\) lies in the critical region.

For \(\alpha=0.01\), 1% of the values under the null lie in that area.

Thus: in 1% of the cases, we will incorrectly conclude that there is an effect.
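
The meaning of \(\alpha\) can be checked with a quick simulation (an illustrative sketch in Python, assuming numpy and scipy are available; not part of the slides): draw many samples of \(n=20\) IQ scores under a true \(H_0\) and count how often a one-sided z-test falsely rejects.

```python
import numpy as np
from scipy.stats import norm

# Simulate experiments where H0 is TRUE (no effect): IQ ~ N(100, 15), n = 20
rng = np.random.default_rng(1)
mu0, sigma, n, alpha = 100, 15, 20, 0.01
z_crit = norm.ppf(1 - alpha)                # one-sided critical z for alpha = .01

n_experiments = 100_000
samples = rng.normal(mu0, sigma, size=(n_experiments, n))
z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
false_alarm_rate = np.mean(z > z_crit)      # proportion of Type I errors

print(round(false_alarm_rate, 3))           # close to alpha = 0.01
```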

Today: Type 2 errors

Analogy: missed effects.

We conclude that there is no difference, but in reality there is one (i.e. we miss the effect).

In hypothesis terms: we fail to reject the null hypothesis although we should have done so.

This error term is called \(\beta\).

Inference errors

  • Type I errors: we keep these low by setting \(\alpha\) low
  • Type II errors: we want these low as well!

But there’s no free lunch in statistics!

Statistical power

  • the Type II error is the failure to reject the null hypothesis if we should have done so
  • the probability of this error is called \(\beta\)

The statistical power of a test is \(1-\beta\).

Power and \(\beta\)

Statistical power

Another way of understanding statistical power:

Statistical power is the probability that a (hypothesis) test will correctly reject a false \(H_0\).

Graphical explanation

  • suppose we test the IQ score:
    • the IQ scores are distributed normally with \(\mu = 100\) and \(\sigma = 15\)
    • we now give a sample of \(n=20\) people 3 cups of espresso before they take the IQ test
    • suppose the espresso trick is pure magic: it leads to a full shift of +0.50 SD (7.5 points)

\(H_0: \mu= 100\)

\(H_0\) distribution

Espresso trick distribution

Both together

Stepwise

  1. define alpha as \(\alpha = .05\)
  2. one-sided critical z-value: \(z=1.65\)
  3. with \(\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{20}} = 3.35\), this translates to \(1.65 = \frac{M-100}{\sigma_M} \leftrightarrow 1.65 = \frac{M-100}{3.35} \leftrightarrow M = 105.53\)

So we know that the critical region starts at \(M=105.53\) (for \(n=20\))
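
This step is plain arithmetic and can be reproduced directly (an illustrative sketch, not part of the slides; 1.65 is the slides' rounded table z, the exact value is about 1.645):

```python
mu0, sigma, n = 100, 15, 20
sigma_M = sigma / n ** 0.5       # standard error of the mean: 15/sqrt(20) ≈ 3.35
z_crit = 1.65                    # one-sided critical z for alpha = .05 (table value)
M_crit = mu0 + z_crit * sigma_M  # where the critical region starts

print(round(sigma_M, 2), round(M_crit, 2))   # 3.35 105.53
```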

\(\alpha\)

Locating the errors

  • we can now say that “green area” = critical region where we reject \(H_0\) with \(n=20\)
  • i.e. “green” = \(\alpha\)
  • so we can also say where \(\beta\) is

\(\alpha\) and \(\beta\)

Locating the errors

  • \(\beta\) [=“blue”] is the area (probability) where we fail to reject \(H_0\) although we should have!

Bringing it all together

  • if we know the probability of \(\alpha\), then we know \(1-\alpha\) under the null
  • and if we know \(\beta\), then we know \(1-\beta\)

\(1-\alpha\)

\(1-\beta\)

Bringing it all together

  • the “lightblue” area is \(1-\beta\) = statistical power

“So if we want to increase power [=lightblue], why don’t we just make \(\beta\) [=darkblue] smaller?”

The relationship of \(\alpha\) and \(\beta\)

  • the boundary of \(\alpha\) for \(H_0\) is also
  • the boundary of \(\beta\) for \(H_A\)

Less strict \(\alpha\)

Stricter \(\alpha\)

Always a compromise!

  • if we make \(\alpha\) stricter (= decreasing), we increase \(\beta\), so we decrease the statistical power \(1- \beta\)
  • if we want \(1-\beta\) higher (= more power), we must shift the criterion to decrease \(\beta\), which increases the Type I error \(\alpha\)
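
This trade-off can be made concrete numerically (an illustrative sketch assuming scipy; exact z-values are used, so results differ slightly from table lookups):

```python
from scipy.stats import norm

# Power of the one-sided espresso test (mu_A = 107.5, n = 20) for different alphas
mu0, mu_A, sigma, n = 100, 107.5, 15, 20
sigma_M = sigma / n ** 0.5

def power(alpha):
    M_crit = mu0 + norm.ppf(1 - alpha) * sigma_M   # boundary of the critical region
    return norm.sf((M_crit - mu_A) / sigma_M)      # P(M > M_crit) under H_A

for a in (0.10, 0.05, 0.01):
    print(a, round(power(a), 3))   # stricter alpha -> lower power
```

A lenient \(\alpha\) of .10 gives a power of about 0.83; a strict \(\alpha\) of .01 cuts it to about 0.46.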

Two solutions

  • increasing sample size \(n\)

From \(n=20\) to \(n=40\)

From \(n=20\) to \(n=100\)
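
The effect of sample size can be verified numerically (an illustrative sketch assuming scipy):

```python
from scipy.stats import norm

# Power of the one-sided espresso test (mu_A = 107.5, alpha = .05) as n grows
mu0, mu_A, sigma, alpha = 100, 107.5, 15, 0.05
z_crit = norm.ppf(1 - alpha)

powers = {}
for n in (20, 40, 100):
    sigma_M = sigma / n ** 0.5                    # shrinks as n grows
    M_crit = mu0 + z_crit * sigma_M               # critical region moves toward mu0
    powers[n] = norm.sf((M_crit - mu_A) / sigma_M)
    print(n, round(powers[n], 4))                 # ≈ 0.7228, 0.9354, 0.9996
```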

Two solutions

  • increasing sample size \(n\)
  • larger effects

Remember Cohen’s d?

  • \(d=\frac{\mu_{treatment} - \mu_0}{\sigma} = \frac{107.50 - 100}{15} = 0.5\)

What if we doubled \(d\)?

From \(d=0.5\) to \(d=1.0\)
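
The effect-size comparison can also be checked numerically (an illustrative sketch assuming scipy):

```python
from scipy.stats import norm

# Power of the one-sided test (n = 20, alpha = .05) for two effect sizes
mu0, sigma, n, alpha = 100, 15, 20, 0.05
sigma_M = sigma / n ** 0.5
M_crit = mu0 + norm.ppf(1 - alpha) * sigma_M

powers = {}
for d in (0.5, 1.0):
    mu_A = mu0 + d * sigma                    # Cohen's d: (mu_A - mu0) / sigma
    powers[d] = norm.sf((M_crit - mu_A) / sigma_M)
    print(d, round(powers[d], 3))             # ≈ 0.723 and 0.998
```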

Factors that matter

  • Statistical power increases if we:
    • increase \(n\)
    • increase the effect size of interest
    • increase \(\alpha\)
  • Statistical power decreases if we:
    • decrease \(n\)
    • decrease the effect size of interest
    • decrease \(\alpha\)

Part 2: How do we calculate statistical power?

Our example

  • IQ scores that are distributed normally with \(\mu = 100\) and \(\sigma = 15\)
    • we now give a sample of \(n=20\) people 3 cups of espresso before they take the IQ test
    • suppose the espresso trick is pure magic: it leads to a full shift of +0.50 SD (7.5 points)

Steps to calculate power

  1. Critical region under \(H_0\)
  2. Region in \(H_A\) “beyond” the critical region of \(H_0\)

Critical region

  1. for \(\alpha = .05\)
  2. one-sided critical z-value: \(z=1.65\)
  3. with \(\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{20}} = 3.35\), this translates to \(1.65 = \frac{M-100}{\sigma_M} \leftrightarrow 1.65 = \frac{M-100}{3.35} \leftrightarrow M = 105.53\)

This is the value under \(H_0\) which demarcates the critical region of “statistical significance”

Any \(M > 105.53\) means we reject \(H_0\).

Statistical power is about \(H_A\):

  • so we need the probability under \(H_A\) of values that are larger than the critical value of \(H_0\)

Calculating power

[= lightblue]

  • probability under \(H_A\) that is larger than the critical value of \(H_0\) (i.e. 105.53)

\(z=\frac{M-\mu}{\sigma_M} = \frac{105.53-107.50}{3.35} = -0.59\)

Thus we know that 105.53 in \(H_A\) corresponds to \(z=-0.59\).

The power is thus the body of the distribution!

Table lookup

  • For \(z=-0.59\):
  • proportion in tail = 0.2776
  • proportion in body = 0.7224

The statistical power here is 0.7224.

We had a 72.24% chance of correctly rejecting \(H_0\), given that the espresso effect is real.
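
The table-lookup result can be reproduced (an illustrative sketch assuming scipy, using the slides' rounded intermediate values):

```python
from scipy.stats import norm

# Reproduce the hand calculation with the slides' rounded values
mu_A, sigma_M = 107.50, 3.35
M_crit = 105.53                             # critical value under H0 (earlier step)
z_A = round((M_crit - mu_A) / sigma_M, 2)   # ≈ -0.59 under H_A
power = norm.sf(z_A)                        # body proportion beyond z = -0.59

print(round(power, 4))                      # 0.7224
```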

Another example

  • IQ score \(\sim N(100, 15)\)
  • Brain food promises an increase of \(d=0.8\)

What is the achieved statistical power for \(n=40\) and \(\alpha=.01\)?

Steps

  1. Critical value under \(H_0\)?

Needed: tail probability of \(p = .01 \rightarrow z=2.32\)

Steps

  2. Value that corresponds to the critical z:

\(2.32 = \frac{M-100}{\sigma_M}\) with

  • \(\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{40}} = 2.37\)

So: \(2.32 = \frac{M-100}{2.37} \leftrightarrow M = 105.50\)

Steps

  3. Obtaining statistical power
  • probability under \(H_A\) that is larger than the critical value of \(H_0\) (here: 105.50)

For this we need to know a bit more about \(H_A\)

Steps

We need the mean of \(H_A\):

  • We know that \(d=0.8 \leftrightarrow 0.8 = \frac{M-100}{15} \leftrightarrow M = 112\)

Cohen’s d of 0.8 translates to an IQ of 112.

Steps

Back to 3 again:

  • probability under \(H_A\) that is larger than the critical value of \(H_0\) (here: 105.50)

\(z=\frac{105.50-112}{2.37} = \frac{-6.50}{2.37} = -2.74\)

Exact power

We know that the power is the body proportion (and corresponding probability), so:

Power = .9969
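
The whole second example in one go (an illustrative sketch assuming scipy; exact z-values give the same power to four decimals):

```python
from scipy.stats import norm

# Brain-food example: d = 0.8, n = 40, one-sided alpha = .01
mu0, sigma, n, alpha, d = 100, 15, 40, 0.01, 0.8
sigma_M = sigma / n ** 0.5                    # 15/sqrt(40) ≈ 2.37
M_crit = mu0 + norm.ppf(1 - alpha) * sigma_M  # ≈ 105.52 (slides: 105.50 with z = 2.32)
mu_A = mu0 + d * sigma                        # d = 0.8 shifts the mean to 112
power = norm.sf((M_crit - mu_A) / sigma_M)

print(round(power, 4))                        # 0.9969
```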

All in one plot

In the live session

  • power examples by hand and step-by-step
  • additional example on CIs
  • formulas clarification

Recap

  • the relationship between inference error types (Type I and Type II)
  • the relationship between power and sample size, effect size and alpha
  • calculating power by hand

Next week

Correlation